ABSTRACT


This report describes the work and results of a study to establish the
performance of existing digital image processing techniques on FLIR imagery
supplied by NVL. The image processor would form the basis for an automatic
target cueing system to assist the human operator of a sensor system.

The study consisted of a statistical test, performed by computer simulation,
including training and test phases. The target classes were truck, tank, and
APC. Initial detection of targets scored in the 90% range. Depending upon
image quality, the classification performance was in the 60% to 80% range.
Using the same sensitivity setting, the false alarm rate was 20%. The exact
setting, trading false alarm rate for detection rate, would depend upon the
mission requirements.

It was noted that the number of samples was very limited in view of the
number of features used. Future efforts might include a larger data base.
It was also suggested that the design of a compact automatic cueing system
breadboard be started to keep pace with sensor hardware development.




TABLE  OF  CONTENTS 


FOREWORD

ABSTRACT 

1.0  INTRODUCTION 

2.0  DESCRIPTION  OF  WESTINGHOUSE  AUTOMATIC  CUEING  TECHNIQUES 

2.1  Preprocessor 

2.1.1  Gradient  Extraction 

2.1.2  Gradient  Maximizing 

2.1.3  Subset  Generation 

2.1.4  Blob  Detector 

2.1.5  Texture  Data 
2.2  Final  Processor 

2.2.1  Blobs  and  Groups 

2.2.2  Feature  Generation 

2.2.3  Recognition  Algorithm 

3.0  TEST  PROGRAM  USING  NVL  IMAGERY 

3.1  The  Data  Base 

3.2  Preparation  of  Imagery 

3.3  The  Image  Processing  Sequence 

3.4  Training  Program 

3.5  Scoring  of  Training  Set 

3.6  Test  Set  Performance 

3.7  Discussion 

4.0  CONCLUSIONS  AND  RECOMMENDATIONS 

5.0  REFERENCES 



LIST OF ILLUSTRATIONS

Figure No.   Title

2-1          Automatic Cueing System
2-2          Steps in Automatic Cueing
2-3          Digital Image Processor
2-4          Gradient Extraction Process
2-5          Steps in the Preprocessing of an Image
2-6          Block Diagram - Blob Detector and Subset Generator
             Operational Cycle
2-7          Final Processor
2-8          Blob Merging
2-9          Significance of Polarities Between Subsets
2-10         Recognition Algorithm - Block Diagram
2-11         Regional Boundary Sets B G, K, L
2-12         Classification Logic Flow
3-1          Sketch of Targets
3-2          A 50 x 50 Window Extracted from an Image
3-3          Photo Playbacks of 50 x 50 Images
3-4          Two Examples of Degraded Samples
3-5          Data Flow Through Simulated Processor
3-6(a)       Training Set Scatterplots
3-6(b)       Training Set Scatterplots
3-6(c)       Training Set Scatterplots
3-7          Extraneous White Line in Image


LIST OF TABLES

Title

False Alarm Rejection Criteria
Tie-Breaking Rules
List of Digital Tapes Available
List of Data Windows
Number of Samples
Training Set Results - Raw Scores
Training Set Results - Percentages
Test Set Results - Raw Scores
Test Set Results - Percentages
Training and Test Results - By Window
Sum of Test and Training Results
Interpreter Performance in Recognition




1.0  INTRODUCTION 



The major business activity of the Westinghouse Systems Development
Division consists of the development of sophisticated sensor systems for
military requirements. The programs cover radar, IR and visual frequencies.
In 1965, pattern recognition research was initiated within the Division to
support these sensor programs. The objective of this research was the
development of digital image processing and automatic recognition techniques
and systems.

By 1970, a specific approach had been established for the extraction
of useful information, such as target location and identity, from remote
sensor images. The approach consists of the serial preprocessing of the
digitized image samples, on a line-by-line basis, so as to extract certain
key image features, and to reduce the data bandwidth by orders of magnitude.
The results of the preprocessing operation are then operated on by a
general-purpose processor, to locate and classify targets, or to perform
map-matching between similar terrain images. The Westinghouse techniques for
digital image processing are described in Section 2.0.

At about the same time, military laboratories began to support this
research for the specific application to the problem of "automatic target
cueing". We might define automatic cueing as the use of automatic recognition
devices to initiate appropriate audible or visual signals (cues) to assist
the human interpreter in his evaluation of sensor images. The cueing system
acts as an information filter on the sensor data, by selecting important
events, by providing an audible alarm to attract the attention of the
operator, and then by providing visual indications of the target location and
identity on his display. In 1971, Westinghouse began automatic cueing studies
with the Naval Air Systems Command (Reference 1), with the Air Force Rome Air
Development Center (Reference 2), and with the Army's Frankford Arsenal
(Reference 3). In the latter program, a real-time demonstration breadboard
cueing system was constructed, which is presently being tested with
video-taped flight data.

In general, the results of these programs are very promising when compared
with available performance data for human operators under realistic
circumstances. It appears quite possible that the target acquisition
performance of a helicopter pilot, for example, might be doubled with the use
of automatic cueing devices.

In February 1975, a presentation on Westinghouse cueing techniques was
made to Mr. John Dehne and Dr. James Tegnelia of NVL. Following that meeting,
Mr. Dehne indicated that NVL was preparing a data base of digitized images
for an 875-line TV compatible FLIR sensor. He expressed an interest in the
performance of the Westinghouse techniques on this data base. The program
described in this report provides an answer to that question.

The description of the techniques in Section 2.0 is followed by a
detailed discussion of the test program, using the imagery supplied by NVL,
in Section 3.0. Conclusions and recommendations are contained in Section 4.0,
and References in Section 5.0.


2.0  DESCRIPTION OF WESTINGHOUSE AUTOMATIC CUEING TECHNIQUES


We define "automatic cueing" as the use of automatic recognition devices
to initiate appropriate audible or visual signals or cues to assist the human
interpreter in his evaluation of sensor images. As shown by Figure 2-1, the
cueing system acts as an information filter on the sensor data, selecting
images of importance, marking them with visual indications of target location,
and providing audible alarms to attract the attention of the interpreter.

The sequence of operations carried out by an automatic cueing system
is shown in Figure 2-2. The operations are performed over the entire image,
although the figure examines only a small window of the FLIR display shown
at the top. First, the image is digitized for use by the image processor.
Preprocessing of the digitized data serves to reduce its bandwidth by
retaining only the information necessary for automatic recognition. When
recognition of desired targets has been accomplished, appropriate audible
and visual cues are initiated. These cues not only identify the target
types, within the limitations of sensor resolution, but can also provide
precise coordinates of their location in the image. A variety of target types
can be accommodated simultaneously by the cueing system.


The core of the cueing system is the digital image processor. It is
a hybrid system utilizing a high-speed hardwired preprocessor, followed by a
programmable processor (general-purpose computer) to generate features and
employ the recognition logic. The preprocessor is provided as special-purpose
hardware in order to achieve a high data input rate. The output data
rate is greatly reduced (by at least a factor of five), permitting the
flexibility available in a slower speed programmable processor for final
target decisions. A block diagram of the image processor is shown in Figure
2-3. It should be noted that the image processor is a two-dimensional
processor. The preprocessor contains 4 sets of single-line storage that "wrap
around" to permit two-dimensional operations. Operation in both dimensions
simultaneously provides greater noise rejection and a better match to the
signal's behavior than one-dimensional operations.

2.1  Preprocessor 

The function of the preprocessor is to extract from the gray level
image the information required for generating recognition features. Three
types of data are extracted. The primary data are the straight-line contours
of gray-level gradient. Thus a line-drawing of the video image is generated.
The second type of data are positional cues of gray-level closed objects
(or "blobs"). The location of a blob generates a window within which
recognition features are generated. The final set of data are statistical
parameters computed during the preprocessing which may be used in texture
classification.

Operation of the preprocessor is on a line-by-line basis with respect
to the input image. Therefore, video data may be handled directly.
Furthermore, storage requirements in the preprocessor are limited to single
lines of data only.

2.1.1  Gradient  Extraction 

To generate the straight-line contours (subsets) of the image, it is
necessary to first compute the two-dimensional gradient at each image point.
This is done as shown in Figure 2-4 with a four-pixel window scanning across
the image in a raster format. The gradient amplitude and angle are
approximated as shown. The gradient direction is quantized to 16 discrete
directions, as depicted in the diagram. To suppress the areas of negligible
gradient activity (containing no significant contour or edge information), a
threshold is applied to the gradient amplitude. Figures 2-5(a) and 2-5(b) show
a gray level image and its computed gradient. This is a FLIR image of a small
truck. The gradient image has been thresholded and displays gradient
direction, with the directions 10 through 16 coded with an overprinted
slash /.

Figure 2-3. Digital Image Processor
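
The gradient step can be illustrated with a short sketch. The exact amplitude and angle approximations of Figure 2-4 are not reproduced in the text, so the averaged differences over the 2 x 2 window, the threshold value, and the function name `gradient_2x2` below are assumptions made only for illustration. Direction codes run from 1 to 16, and 0 marks a cell suppressed by the threshold.

```python
import math

def gradient_2x2(image, threshold=4.0, n_dirs=16):
    """Approximate gradient from a four-pixel window scanned in raster order.

    image is a list of rows of gray levels. Returns (amplitude, direction)
    grids of size (rows-1) x (cols-1). Direction is a code 1..n_dirs, or 0
    where the amplitude falls below the threshold.
    """
    rows, cols = len(image), len(image[0])
    amplitude = [[0.0] * (cols - 1) for _ in range(rows - 1)]
    direction = [[0] * (cols - 1) for _ in range(rows - 1)]
    sector_width = 2.0 * math.pi / n_dirs
    for r in range(rows - 1):
        for c in range(cols - 1):
            a, b = image[r][c], image[r][c + 1]          # top row of window
            d, e = image[r + 1][c], image[r + 1][c + 1]  # bottom row of window
            gx = ((b + e) - (a + d)) / 2.0   # right minus left (assumed form)
            gy = ((d + e) - (a + b)) / 2.0   # bottom minus top (assumed form)
            mag = math.hypot(gx, gy)
            if mag < threshold:
                continue  # negligible gradient activity is suppressed
            angle = math.atan2(gy, gx) % (2.0 * math.pi)
            sector = int(round(angle / sector_width)) % n_dirs
            amplitude[r][c] = mag
            direction[r][c] = sector + 1
    return amplitude, direction
```

For a vertical step edge, only the window that straddles the edge survives the threshold, and it receives a single direction code.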

2.1.2  Gradient  Maximizing 

After gradient thresholding the edges are generally still too wide for
subset generation. Therefore a gradient thinning operation is performed.
The operation basically "skeletonizes" adjacent colinear gradient directions
to the peak or maximum points.

The algorithm utilizes a raster scanning window containing a gradient
cell "X" and 4 of its nearest neighbors. The scanning window is depicted at
the top of Figure 2-6. The neighbors with colinear gradients are compared to
"X". The largest gradient is then retained. This procedure is repeated
sequentially for each gradient point in the image. An example of the maximized
gradient is shown in Figure 2-5(c).
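
A minimal sketch of such a thinning pass follows. The window layout (neighbors A, B, C on the previous line and D to the left of X), the angular tolerances, and the name `thin_gradient` are assumptions, since the window itself appears only in Figure 2-6. A neighbor is treated as colinear when it lies roughly along X's gradient direction and carries a similar direction code; of each such pair, the weaker cell is cleared.

```python
# Assumed causal window in raster order:
#   A B C
#   D X
NEIGHBOURS = {"A": (-1, -1), "B": (-1, 0), "C": (-1, 1), "D": (0, -1)}
# Orientation (degrees, modulo 180) of the line joining X to each neighbour,
# using image coordinates with the row index increasing downwards.
LINE_ANGLE = {"A": 45.0, "B": 90.0, "C": 135.0, "D": 0.0}

def circular_gap(a, b, period):
    """Smallest separation between a and b on a circle of the given period."""
    gap = abs(a - b) % period
    return min(gap, period - gap)

def thin_gradient(amp, direction, n_dirs=16, tol_steps=1):
    """Keep the larger amplitude of each colinear pair of gradient cells."""
    step = 360.0 / n_dirs
    rows, cols = len(amp), len(amp[0])
    out = [list(row) for row in amp]
    for r in range(rows):
        for c in range(cols):
            if direction[r][c] == 0:
                continue
            angle_x = (direction[r][c] - 1) * step
            for name, (dr, dc) in NEIGHBOURS.items():
                rr, cc = r + dr, c + dc
                if rr < 0 or cc < 0 or cc >= cols or direction[rr][cc] == 0:
                    continue
                parallel = circular_gap(direction[rr][cc], direction[r][c],
                                        n_dirs) <= tol_steps
                colinear = circular_gap(LINE_ANGLE[name], angle_x, 180.0) <= step
                if parallel and colinear:
                    if amp[rr][cc] > amp[r][c]:
                        out[r][c] = 0      # neighbour is the peak
                    elif amp[rr][cc] < amp[r][c]:
                        out[rr][cc] = 0    # X is the peak
    return out
```

Comparisons use the original amplitudes, so the result does not depend on the order in which cells are cleared.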

2.1.3  Subset  Generation 

Subset generation is accomplished by "growing" a line formed by adjacent
parallel gradients. As before, a 5-cell scanning window is employed. The new
data point is labeled cell "X". Its four neighbors are examined (sequentially:
A, B, C, then D) to find those containing a parallel (within a tolerance)
gradient direction. If one is found, then "X" is added as the next point in
the line from the neighbor. Neighbors that are colinear to the gradient of "X"
are excluded to prevent false lines from forming. The operational cycle of the
subset generator is diagramed in Figure 2-6. An example of the subsets derived
from a gray level image is shown in Figure 2-5(e).

Figure 2-4. Gradient Extraction Process

Figure 2-5. Steps in the Preprocessing of an Image

2.1.4  Blob  Detector 

The blob detector detects the presence of a contiguous area of gray
levels lighter (or darker) than its surrounding background. It operates
independently of size, orientation, and position, and will detect all but
sharp, concave shapes.

The operation of the blob detector is similar to that of the subset
generator. The input data is the output of the gradient stage. Basically,
the blob detector seeks to trace paths along contiguous, slowly changing
gradient directions. Bookkeeping counters for each path being traced keep
track of the gradient at the start of the path. When two paths from the same
starting gradient join, a blob detection occurs. Additional bookkeeping
counters measure the maximum and minimum X and Y excursions, providing a
measure of the blob's size.

Figure 2-6 depicts the operational cycle of the blob detector. It
uses the basic 5-cell window scanning the gradient image. Each of the 4
neighbors of the X-pixel is examined to determine if X should be added as
the next point in a blob tracing path. Figure 2-5(d) displays the paths
being traced out from the gradient image, Figure 2-5(b). The numbers
printed out in the figure indicate the coded bits that keep track of the
start of each path. Blob detection is coded as a pair of B's. The output
of the blob detector consists of the blob polarity, center position, and
horizontal and vertical dimensions. This data permits the object to be
isolated for feature extraction.

2.1.5  Texture Data

The third preprocessor function is the collection of statistical data
for texture classification. The gray level image area is divided into windows
of 30 x 30 pixels for statistical data collection. The average gray level
and average gradient amplitude are computed. A limited histogram of the
gradients is accumulated; i.e., the number of pixels with gradient amplitude
equal to zero, one, two, and three. Also, two additional parameters are
computed: (1) the number of pixels with gray level > a, and (2) the number
of pixels counted against a second gray-level threshold, b. The subset
generator provides two statistics: (1) the number of subsets per window, and
(2) the number of "long" subsets.
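
The per-window statistics can be sketched as below. The function name `texture_stats`, the dictionary keys, and the reading of the second threshold as "gray level < b" are assumptions for illustration; the subset counts supplied by the subset generator are not modeled.

```python
def texture_stats(gray, grad_amp, a, b):
    """Collect texture statistics for one window.

    gray and grad_amp are equally sized lists of rows (30 x 30 in the report,
    any size here). The comparison against b is assumed to be 'less than'.
    """
    pixels = [g for row in gray for g in row]
    grads = [g for row in grad_amp for g in row]
    n = len(pixels)
    return {
        "mean_gray": sum(pixels) / n,
        "mean_gradient": sum(grads) / n,
        # limited histogram: counts of gradient amplitude 0, 1, 2 and 3
        "gradient_hist": [sum(1 for g in grads if g == k) for k in range(4)],
        "count_above_a": sum(1 for g in pixels if g > a),
        "count_below_b": sum(1 for g in pixels if g < b),  # assumed direction
    }
```

Calling `texture_stats(window_gray, window_grad, a=25, b=15)` on each window yields one small record per window, which is the kind of reduced data a texture classifier would consume.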

This  study,  however,  concentrated  on  the  training  and  testing  of  the 
target  recognition  algorithms,  not  so  much  on  texture  analysis.  The  texture 
statistics  were  generated  during  the  study,  but  were  not  classified  or  utilized. 

2.2  Final  Processor 

The final processing of the data is accomplished in a programmable
processor (general-purpose computer). Its task is to generate the recognition
features and perform the target decision logic. A block diagram of the final
processor is shown in Figure 2-7.

2.2.1  Blobs and Groups


To reduce storage and speed requirements and to reduce background
interference, recognition features are not computed for the entire image.
Instead, they are initially computed only within local rectangular areas
whose positions are designated by the blob detector. Therefore, the blob
detector in effect "cues" the processor to a local area containing a possible
target. However, for those targets having complex shapes, such as aircraft,
cues are also initiated by the presence of a "starter" subset. A starter
subset is defined as one whose length exceeds a predetermined value (e.g.,
l > 5). For each starter subset within the image, a square area (or window)
centered on the subset is also used as a positional cue for the processor.
The blob and long subset windows are used to collect groups of subsets, as
will be discussed later.


As seen from Figure 2-7, the first function performed by the processor
is blob merging. Under certain conditions a single target can give rise to
multiple (usually no more than two) blob detections that overlap. Therefore,
the blob list in the preprocessor buffer stage is scanned for blobs with
overlapping areas. Overlapping blobs are merged into a single new blob
whose area will enclose the union of the original blob areas. See Figure 2-8.
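
Blob merging of this kind can be sketched as follows. Representing each blob as a bounding box `(x_min, y_min, x_max, y_max)` and the names `overlaps` and `merge_blobs` are assumptions; the report's blob records also carry polarity and center position, which are omitted here.

```python
def overlaps(a, b):
    """True if two boxes (x_min, y_min, x_max, y_max) share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def merge_blobs(blobs):
    """Repeatedly replace overlapping boxes by the box enclosing their union."""
    merged = [tuple(b) for b in blobs]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    a, b = merged[i], merged[j]
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, the new box may overlap others
    return merged
```

Restarting the scan after each merge handles the case where the enlarged box now overlaps a blob that neither original blob touched.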


Figure 2-8. Blob Merging

Following blob merging a search is made for several different "associations".
In general, an "association" means that an element (e.g., blob) is within a
specified distance from another element. An association of long subsets with
other long subsets is a significant association. These pairings may later be
screened to detect the presence of roads. Also the association of blobs with
long subsets is examined. Subsequently, these long subsets are prevented
from being used to collect a subset group, since the blob is usually a more
accurate cue.

When  the  associations  have  been  made  the  process  of  group  forming  starts. 
Each  blob  or  long  subset  defines  a window.  For  each  window,  all  subsets  are 
screened  by  X-Y  position.  All  the  subsets  falling  within  the  window  are 
defined  as  the  group  for  that  window. 
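
Group forming by X-Y screening can be sketched in a few lines. Representing a subset as a dictionary with midpoint keys `"x"` and `"y"`, and a window as `(x_min, y_min, x_max, y_max)`, are assumptions made for the example.

```python
def form_group(window, subsets):
    """Return the subsets whose midpoints fall inside the window."""
    x_min, y_min, x_max, y_max = window
    return [s for s in subsets
            if x_min <= s["x"] <= x_max and y_min <= s["y"] <= y_max]
```

The same routine serves both blob windows and the square windows centered on long subsets, since only the window coordinates differ.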

Further screening of the groups is done to eliminate subsets not
belonging to a candidate target area. It should be noted that the gradients
of the subsets belonging to any dark (light) object point inwards (outwards),
with few exceptions. See Figure 2-9(a).

Figure 2-9. Significance of Polarities Between Subsets

Subset pairs with non-opposing (inconsistent) polarities, as in Figure 2-9(b),
do not usually belong to an object, but are merely background clutter.
Therefore, long subset groups are screened of any subsets with polarities
inconsistent with the long subset defining the group. Blob groups are screened
of any subsets with polarities inconsistent with the blob color, and relative
to its center.

2.2.2  Feature  Generation 

The performance of a recognition system ultimately depends on the
choice of measurements or features representing the target which are used
by the decision logic. Because of its programmable nature, the final
processor can be readily modified as regards both the target complement and
their associated feature sets.

The training phase of this study resulted in the selection of 11 types
of features to be calculated for each blob or long subset group (i.e.,
candidate object). These will now be described.

(1)  Dimensions:

The vertical extent ΔY of an object is output from the blob detector
or computed from the long subset group. The horizontal extent ΔX is
also computed.

(2)  Aspect Ratio:

The aspect ratio is defined as ΔX/ΔY.

(3)  Number for Further Processing - N.F.P.:

As previously stated, the subsets in each group are screened for polarity.
In addition, the remaining subsets are designated as belonging to the
top or to the bottom of the group. The designation is based upon the
orientation and polarity direction of the subset. This sorting effectively
separates the object into a bottom half and a top half. In the process, if
any subset's midpoint physically occurs in the opposite half, it is thrown
away. The number of subsets remaining at this point is called the N.F.P.
count.

(4)  Final Active Quadrant Count - NFACT:

The subsets are also sorted into a right side or a left side based on
angle and polarity. At this point, each subset has been assigned to
one of four quadrants. The number of quadrants that have at least
one subset is the NFACT count.

(5)  Length Residue ΔR:

This is a feature useful for separating triangular objects from
rectangular objects. It is an approximation of the total length of
non-parallel subsets within a group. Its dimension is in pixels. A
triangle in one orientation has a positive value, a rectangle has zero
value, and a triangle in the opposite orientation has a negative value.
It is computed as follows.

For each quadrant K = 1, 2, 3, 4 compute

$$R_K = \frac{\sum_{i=1}^{N_K} S_i \, \theta_i \, d_i}{\sum_{i=1}^{N_K} d_i}$$

where N_K = number of subsets in quadrant K
      S   = the subset's length
      θ   = min{off-vertical orientation of the subset,
                off-horizontal orientation of the subset}
      d   = distance from center of group.

Then

$$\Delta R = \left(R_{\text{top left}} + R_{\text{top right}}\right) - \left(R_{\text{bot left}} + R_{\text{bot right}}\right)$$


(6)  Closure:

Closure is defined as

$$\text{Closure} = \frac{\sum S}{P}$$

where S = each subset's length
      P = 2 · (ΔX + ΔY)
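
As a small worked sketch of the aspect ratio and closure features, assuming the subset lengths and the extents ΔX and ΔY are already available (the function names are hypothetical):

```python
def aspect_ratio(dx, dy):
    """Aspect ratio: horizontal extent divided by vertical extent."""
    return dx / dy

def closure(subset_lengths, dx, dy):
    """Total subset length divided by the perimeter of the bounding box."""
    perimeter = 2.0 * (dx + dy)
    return sum(subset_lengths) / perimeter
```

For an object 10 pixels wide and 5 pixels high whose subsets trace the whole outline (lengths 10, 5, 10, 5), `closure` gives 30/30 = 1.0. A single 6-pixel subset in the same box gives 6/30 = 0.2.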

(7)  KHOLE:

Many APC targets display a black "hole" from the rear viewing angle.
This feature searches for this property. If a subset of the correct
angle and polarity is found in the top half of an object, such as to
be the top part of the "hole", then KHOLE = 1. Otherwise, it is 0.

(8)  LONTOP:

Many of the tanks at long range displayed a rather long, somewhat
horizontal top. (Apparently the turret was not very hot.) So if
a long, nearly horizontal subset forms the very top of the object,
LONTOP = 1. Otherwise, it is 0.

(9)  Corners and Notches:

The top half and bottom half of the object are searched separately
for the presence of an outside corner or an inside corner (notch).

[Drawing: outside corner and inside corner (notch)]

Those nearest 90° are printed out. We thus have four possibilities:
C_top, C_bot, N_top, N_bot.


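(10)  Peak:

To discriminate tall column-like tops (or bottoms) from low broad or
flat tops (or bottoms), the peak calculation is made for the subsets
in the top and bottom halves, separately. It is computed as a sum over
the subsets of the subset length S, weighted by a squared function of
the subset's horizontal angle θ, and scaled by 100.

[Equation and shape drawings not legible in this copy]

where θ = subset's horizontal angle
      S = subset length

One of the two shapes drawn has negative Peak values, while the other has
positive Peak values. One other parameter is computed:

$$\Delta\text{Peak} = \text{Peak}_{\text{Top}} - \text{Peak}_{\text{Bot}}$$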

(11)  Sym:

To measure the symmetry of the top (or bottom) of an object, another
calculation is made. For the top and the bottom subsets, a sum over the
subsets is computed in which the offset (X - X̄) is weighted by the subset
length S and a squared function of the angle θ, and scaled by 100.

[Equation and shape drawing not legible in this copy]

where S and θ are as before,
      X̄ = midpoint of the object
      X = the subset's midpoint.

Asymmetric tops will have a large Sym magnitude. Symmetric tops will have
zero values. Two additional parameters are computed from |Sym_Top| and
|Sym_Bot|: a combined value, Sym, and the difference

$$\Delta\text{Sym} = \Big|\, |\text{Sym}_{\text{Top}}| - |\text{Sym}_{\text{Bot}}| \,\Big|$$


2.2.3  Recognition  Algorithm 


As indicated earlier in Figure 2-7, the final block in the processor
is the recognition algorithm. A block diagram is shown in Figure 2-10. The
features for each blob or long subset group have now been computed. The
first process indicated in the figure is the screening out of false alarms,
or non-targets. To that end, two stages are employed. The first stage uses
the False Alarm Rejection Criteria of Table 2-I. A failure of any one of the
rules rejects the group as a false alarm. The second stage is a minimum
acceptable value for the feature Closure, as shown in Figure 2-10.


Figure  2-10  Recognition  Algorithm  - Block  Diagram 


TABLE  2-1 


FALSE  ALARM  REJECTION  CRITERIA 


REJECT  A CANDIDATE  TARGET  AS  A FALSE  ALARM 
IF  IT  FAILS  ANY  FOLLOWING  TEST: 


1) 

2) 

3) 


4) 

5) 

6) 

7) 

8) 

9) 


NFACT  £ 3 
NFP  £ 3 
C-N  count 


Closure  A .8 
Jar/  > .19 

.4  £ Aspect  Ratio  A 4. 
.4  £ Aspect  Ratio  £ 3. 


Peak.  £ 30. 
Bot 


I Sym  I * 900 . 
1 Topi 

or  Bot 

3 £ a X £ 26 

3 A i Y £ 26 


(for  subset  groups,  only) 

(for  subset  groups,  only) 

A1  for  .37  A Closure  £ .6 

> 2 for  .6  < Closure  £ .7 

>3  for  .7  < Closure  £ .8 

(for  subset  groups,  only) 
(for  subset  groups,  only) 

(for  blob  groups) 

(for  subset  groups) 
All groups (or objects, at this point) remaining are considered targets
and will be classified into one of the three target classes. A total of 13
features are used to classify the targets. Normally, though, only the first
11 are used in classification; the remaining 2 are added for tie-breaking
cases. Decision boundaries in the 11-dimensional* feature space were
established during the training phase, as described in Section 3.4. To achieve
an early estimate of performance, a simplified decision space was utilized.
As shown in Figure 2-11, nine features are used in a pair-wise manner to
yield 15 separate classification regions. The 10th and 11th features (KHOLE
and LONTOP) provide 2 additional classification regions.

The first step in classifying a target is to determine the region (or
regions) that contains the target's feature pattern, to provide a tentative
class decision(s). As shown in Figure 2-12, the next step is to take a vote
of the tentative decisions. Note that a NON-TRUCK region provides one TANK
vote and one APC vote. Similarly, NON-APC and NON-TANK provide 2-vote
tentative decisions.

If there is no majority, special tie-breaking rules are employed. These
are tabulated in Table 2-II, and utilized as shown in Figure 2-12. The final
classification decision and the target's coordinates are the output data.
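
The voting step can be sketched as below. The class labels as strings, the function names, and the reading of "majority" as a single class holding the highest vote count are assumptions; the tie-breaking rules of Table 2-II are not implemented, and `None` stands for "go to tie-breaking".

```python
CLASSES = ("TRUCK", "TANK", "APC")

def tally_votes(tentative_decisions):
    """Count votes; a NON-X decision gives one vote to each other class."""
    votes = {c: 0 for c in CLASSES}
    for decision in tentative_decisions:
        if decision.startswith("NON-"):
            excluded = decision[len("NON-"):]
            for c in CLASSES:
                if c != excluded:
                    votes[c] += 1
        else:
            votes[decision] += 1
    return votes

def majority(votes):
    """Return the single top-scoring class, or None when the top is shared."""
    best = max(votes.values())
    winners = [c for c, v in votes.items() if v == best]
    return winners[0] if len(winners) == 1 else None
```

With tentative decisions `["TANK", "NON-TRUCK"]`, TANK collects two votes against one for APC and wins outright. A lone `"NON-TRUCK"` leaves TANK and APC level, so the tie-breaking rules would be consulted.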


*However, one of the features, ΔPeak, is correlated with two other features,
Peak_Top and Peak_Bot.

Figure 2-12. Classification Logic Flow

TABLE 2-II
TIE-BREAKING RULES

SET A
  1:  If ΔPeak < -50            decide APC
  2:  If Peak_Top < Peak_Bot    decide non-APC
  3:  If N.F.P. ≥ 9             decide APC

SET T
  1:  If ΔPeak ≥ 110            decide non-Truck
  2:  If Peak_Top > 10          decide non-Truck
  3:  If there exists N_Bot     decide non-Truck
  4:  If ΔR > 0                 decide Truck

3.0  TEST PROGRAM USING NVL IMAGERY

The  statistical  test  program  involved  the  collection  and  preparation  of 
the  data  base,  and  the  use  of  these  images  to  "train"  and  "test"  the  image 
processing  system  by  computer  simulation.  These  steps  will  be  described  in 
this  section. 

3.1  The  Data  Base 


The data base was supplied by NVL. It consists of 14 magnetic digital
tapes of FLIR data, as listed in Table 3-I. The images had been digitized
from a TV-format FLIR system via video tape recordings. Each digitized image
is 800 x 1024 picture elements (pixels) of 8-bit gray level data. Each
digital tape file contains a separate image. Ground truth and 35-mm film
transparencies were also supplied for the images.

The imagery contains target and a few non-target scenes. The targets
are: an M60A tank, M113 APC, and a 2½-ton truck (probably M35 type). Rough
sketches of these targets are shown in Figure 3-1. Dimensional information
is also given. Probable IR "hot" spots are marked on the sketches. A
study of the film strips that were provided shows that frequently at longer
ranges, the turret of the tank is not visible. The truck has a "cold" area
in the rear, noticeable at close range. The APC also has a distinct
characteristic at close range and rear view: it has a noticeable black hole
in the middle, where the door is located. However, at moderate and longer
ranges the targets are difficult to recognize visually. (This will be
discussed further in a later section.)
Figure 3-1. Sketch of Targets

The film strips also reveal that many of the images are seriously degraded
in quality. Vertical stripes and ripple are present on the left side of images.
Herringbone and Moiré patterns, and ripple, appear sporadically over the
field-of-view of many images, and horizontal black and white streaks occur
occasionally. Additionally, the effective resolution appears to be much
coarser than the pixel spacing. These distortions will be considered later.

3.2  Preparation  of  Imagery 

As the digital tapes arrived from NVL, they were copied to provide
"working" tapes more compatible with the particular tape drives at
Westinghouse. Copying the tapes was often a difficult task; errors were
frequently encountered. On the average, two attempts per tape had to be made.
It was also discovered that Tape ? contained completely unknown data. It was
therefore dropped from the data base.


The second step towards preparing the data base was to generate a set
of sub-images or "windows". The existing simulation of the image processing
system uses images of size 50 pixels by 50 pixels. This heretofore provided
a more than adequate area to include any target of interest, plus some
background clutter. It is also fast-running in the simulation, keeping computer
time costs at a minimum. To hold computer costs down and stay compatible
with the existing software, the same size format was used for this study.

A 50 x 50 pixel window was created for each target in each image.
Using the ground truth information and film strips that were supplied, the
coordinates of 50 x 50 size areas containing a target were tabulated. Using
a computer subroutine, the gray levels of these areas were "lifted" from the
digital tapes and copied onto another magnetic tape, as separate files. Figure
3-2 shows an example of a window containing an APC lifted (or extracted) from
file 2 on magnetic tape I-J (ground truth image L-10).
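The "lifting" step described above can be sketched as a simple array copy. This is an illustration only, not the original subroutine: the function name lift_window, the synthetic image, and the window coordinates are assumptions, and the image is taken to be stored as 800 rows by 1024 columns.

```python
import numpy as np

WINDOW = 50  # window side length in pixels, as stated in the text


def lift_window(image, top, left, size=WINDOW):
    """Copy a size x size block of gray levels out of a full image."""
    rows, cols = image.shape
    if top < 0 or left < 0 or top + size > rows or left + size > cols:
        raise ValueError("window falls outside the image")
    return image[top:top + size, left:left + size].copy()


# Synthetic 800 x 1024 image of 8-bit gray levels standing in for one tape file.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(800, 1024), dtype=np.uint8)

# Hypothetical window origin, as would be tabulated from ground truth.
window = lift_window(image, top=300, left=450)
```

The copy keeps the window independent of the source image, which mirrors writing each window to its own file on a second tape.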

Some of the targets in the data base are very large (e.g., > 100 pixels
in length). To fit them within the 50 x 50 windows, areas containing larger
targets were digitally shrunk to approximately 15-20 pixels in target length.
The shrinking was done by averaging a neighborhood and using that value as a
single new pixel. A 2:1 shrink, for example, averages a neighborhood of 2 x 2
pixels to obtain a gray level. The next gray level comes from the next
adjacent 2 x 2 neighborhood. As a consequence, both high frequency noise and
resolution are reduced. However, the resolution loss was considered
non-detrimental for two reasons. First, the FLIR sensor data had been oversampled
in deriving the digital version of the images. Secondly, the present target
recognition system is oriented towards operation with longer range targets,
and thus small size (e.g., 10-30 pixels) and lower resolution-on-target.
The result, then, for those images that were shrunk while being extracted is
a smaller, somewhat smoothed version of the original.
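A minimal sketch of the neighborhood-averaging shrink follows, assuming non-overlapping square neighborhoods as in the 2:1 example above. The function name shrink is an assumption, and the result is left as floating point because the text does not say how averaged gray levels were rounded.

```python
import numpy as np


def shrink(image, factor):
    """Average each non-overlapping factor x factor neighborhood into one pixel."""
    rows, cols = image.shape
    rows -= rows % factor  # ignore any partial neighborhood at the edges
    cols -= cols % factor
    blocks = image[:rows, :cols].astype(np.float64)
    blocks = blocks.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


# Small made-up area: each 2 x 2 neighborhood collapses to its mean.
area = np.array([[10, 20, 30, 40],
                 [30, 40, 50, 60],
                 [0, 0, 8, 8],
                 [4, 4, 0, 0]])
small = shrink(area, 2)
```

With a factor of 2, a 100 x 100 area containing a large target reduces to exactly one 50 x 50 window.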

At  the  same  time  that  the  windows  were  being  extracted  from  the  data 
tapes,  the  lowest  2 bits  of  gray  level  data  were  dropped.  The  digital  image 
processing  system  requires  only  5 or  6 bits  of  data,  and  it  was  estimated 
that  the  image  data  provided  did  not  contain  any  significant  target  or  scene 
information  in  the  lowest  2 bits.  So  only  the  upper  6 bits  were  retained. 
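Dropping the lowest 2 bits amounts to a right shift of each 8-bit gray level. The sketch below assumes plain truncation (no rounding), which the text implies but does not state; the function name is hypothetical.

```python
import numpy as np


def keep_upper_bits(pixels, dropped_bits=2):
    """Discard the lowest bits of 8-bit gray levels by shifting right."""
    return np.asarray(pixels, dtype=np.uint8) >> dropped_bits


gray8 = np.array([0, 3, 4, 130, 255], dtype=np.uint8)
gray6 = keep_upper_bits(gray8)  # values now span 0..63 (6 bits)
```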

After the 50 x 50 windows were written on magnetic tape, they were
photographically played back for visual inspection and verification. The
playback photos revealed that Tapes I-J and J-K did not coincide with their
expected film strip images. Tapes I-J and J-K had to be reformatted and
recopied to make tapes compatible with the photo playback system. Playbacks
of these tapes showed that Tape I-J contained images L-9, L-10, J-7 through
J-10, and L-1 through L-9, in that order. Tape J-K contained, consecutively,
images J-1 through J-10 and K-1 through K-5 (instead of J-6 through J-10 and
K-1 through K-10, as expected).

A total of 1005 windows were extracted. Approximately 240 contain
targets (some are unknown in ground truth); the remaining windows contain no
targets and are used for false alarm testing. Table 3-II lists all of the
windows and targets, along with some diagnostic and ground truth data. The
last page of Table 3-II lists the sources of most of the non-target windows.*
The "MULX" and "MULY" columns indicate that a whole set of adjacent 50 x 50
windows were extracted from one image. For example, the last entry indicates
that 320 windows (20 across by 16 down) were lifted from image D-9 and were
labeled window numbers 686 through 1005. Figure 3-3 shows photographic
playbacks of all 1005 windows comprising the data base. It should be explained
that the windows that appear to be all white are playbacks of windows extracted
from images with reversed (negative) polarity. Upon extraction, these gray
levels were complemented for polarity correction; however, no d.c. adjustment
was made since the preprocessor only uses the gradient information.
Unfortunately, this sometimes caused white saturation during playback (with
the brightness and contrast set up for normal polarity windows).
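The reasoning that a d.c. adjustment is unnecessary can be checked with a small sketch. It assumes 6-bit gray levels (maximum 63) and uses simple adjacent-pixel differences as a stand-in for the preprocessor's gradient operator, which is not specified here; all names are hypothetical.

```python
import numpy as np

MAX_LEVEL = 63  # assumption: 6-bit gray levels after the low bits are dropped


def complement(window, max_level=MAX_LEVEL):
    """Reverse polarity: bright becomes dark and vice versa."""
    return max_level - np.asarray(window, dtype=np.int16)


def gradient_magnitudes(window):
    """Absolute horizontal and vertical differences between adjacent pixels."""
    w = np.asarray(window, dtype=np.int16)
    return np.abs(np.diff(w, axis=1)), np.abs(np.diff(w, axis=0))


negative = np.array([[60, 60, 20],
                     [60, 10, 20],
                     [60, 60, 60]])
positive = complement(negative)

gx_neg, gy_neg = gradient_magnitudes(negative)
gx_pos, gy_pos = gradient_magnitudes(positive)

# A constant (d.c.) offset leaves the differences untouched as well.
gx_off, gy_off = gradient_magnitudes(positive + 5)
```

Complementing flips the sign of each difference but not its magnitude, and a constant offset cancels in any difference, so only the playback brightness is affected.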

The statistical test requires separate sets of training and test images.
Therefore the windows containing targets (as verified by the playbacks) were
split about equally between training and test. An attempt was made to
alternate successive images of a target between training and test, so that the
two sets would contain roughly similar aspect angles, ranges, etc. for each
target.
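The alternating assignment can be sketched as follows. The window numbers are made up for illustration, and starting each target's sequence with the training set is an assumption.

```python
def alternate_split(windows_by_target):
    """Send successive windows of each target alternately to training and test."""
    training, test = [], []
    for target, windows in windows_by_target.items():
        for i, window in enumerate(windows):
            (training if i % 2 == 0 else test).append((target, window))
    return training, test


# Hypothetical window numbers, listed in image order for each target type.
samples = {
    "tank": [1, 2, 3, 5],
    "truck": [4, 6, 9],
    "APC": [31, 33],
}
training, test = alternate_split(samples)
```

Because neighboring images of a target tend to share range and aspect angle, alternating them spreads those conditions across both sets.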

*Some of the images did contain targets and had already been extracted, so
those 50 x 50's here that contained such areas were excluded from any
further usage.
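The block extraction of adjacent windows, and the exclusion of windows that overlap an already extracted target, can be sketched as below. Row-by-row numbering and the target rectangle are assumptions; the counts of 20 across by 16 down and the labels 686 through 1005 come from the text.

```python
WINDOW = 50


def tile_origins(image_rows, image_cols, size=WINDOW):
    """Return (top, left) for each whole window, row by row."""
    down = image_rows // size
    across = image_cols // size
    return [(r * size, c * size) for r in range(down) for c in range(across)]


def overlaps(origin, box, size=WINDOW):
    """True if the window at origin intersects box = (top, left, bottom, right)."""
    top, left = origin
    b_top, b_left, b_bottom, b_right = box
    return (top < b_bottom and b_top < top + size
            and left < b_right and b_left < left + size)


origins = tile_origins(800, 1024)          # 16 down by 20 across
first_number = 686
last_number = first_number + len(origins) - 1

target_box = (120, 430, 160, 520)          # hypothetical target extent
kept = [o for o in origins if not overlaps(o, target_box)]
```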
TABLE 3-II  LIST OF DATA WINDOWS

PICT.
NO.   COMMENTS OR DESCRIPTION                      POL
  1   TAPE A  PIX 1    TANK/S         L-80          0
  2   TAPE A  PIX 2    TANK/S         L-60          0
  3   TAPE A  PIX 3    TANK/S         L-60          0
  4   TAPE A  PIX 3    2 1/2 TON/S    L-a0          0
  5   TAPE A  PIX 4    TANK/S         L-60          0
  6   TAPE A  PIX 1    2 1/2 TON/S    L-60          0
  7   TAPE A  PIX b    FALSE ALARM    L-A0          0
  8   TAPE A  PIX 6    TANK/S         L-60          0
  9   TAPE A  PIX 6    2 1/2 TON/S    L-10          0
 10   TAPE A  PIX 7    FALSE ALARM    BUbM          0
 11   TAPE A  PIX 8    TANK/S         L-60          0
 12   TAPE A  PIX 9    TANK/S         L-60          0
 13   TAPE A  PIX 9    2 1/2 TON/S    L-40          0
 14   TAPE A  PIX 10   TANK/S         L-100         0

[Entries shown as L-a0, L-A0, BUbM, PIX b, and PIX 1 (window 6) are
reproduced as they appear in the scan.]

NO.   COMMENTS OR DESCRIPTION
 15   TAPE B  PIX 1    TANK/S         L-50
 16   TAPE B  PIX      FALSE ALARM    L-40
 17   TAPE B  PIX      TANK/S         L-60
 18   TAPE B  PIX      FALSE ALARM    L-40
 19   TAPE B  PIX      FALSE ALARM    L-80
 20   TAPE B  PIX      TANK/S         L-30
 21   TAPE B  PIX      TANK/S         L-50
 22   TAPE B  PIX      TANK/S         L-60
 23   TAPE B  PIX      FALSE ALARM    L-60
 24   TAPE B  PIX      FALSE ALARM    L-60
 25   TAPE B  PIX      TANK/S         L-60   70-80
 26   TAPE B  PIX      TANK/S         L-80
 27   TAPE B  PIX      2 1/2 TON E    L-60
 28   TAPE B  PIX      FALSE ALARM    L-rs   NEGATIVE
 29   TAPE B  PIX 9    FALSE ALARM    L-60   NEGATIVE
 30   TAPE B  PIX 10   FALSE ALARM    L-60   NEGATIVE

[Most PIX numbers on this page are illegible. Rows are aligned by column
order. The POL column shows only a single 0. The SHK column reads, in page
order: 6 6 M A s 1 4 A 4 4 5 s i a 4.]

NO.   COMMENTS OR DESCRIPTION                                POL
 31   TAPE C  PIX 1    APC/S          L-50                    0
 32   TAPE C  PIX 1    TANK/C         L-bc                    0
 33   TAPE C  PIX 2    APC/S          L-60                    0
 34   TAPE C  PIX 2    TANK/E         L-t>0                   0
 35   TAPE C  PIX 2    2 1/2 TON/E    l«»q                    0
 36   TAPE C  PIX 3    APC/S          L-Sf)   NEGATIVE        1
 37   TAPE C  PIX 3    TANK/E         L./O    NEGATIVE        1
 38   TAPE C  PIX 3    2 1/2 TON/E    L-»r.   NEGATIVE        1
 39   TAPE C  PIX 4    APC/S          L-60                    0
 40   TAPE C  PIX 4    TANK/E         L-oO                    0
 41   TAPE C  PIX 4    2 1/2 TON/E    L-ov                    0
 42   TAPE C  PIX 5    RETICLE                                0
 43   TAPE C  PIX 6    2 1/2 TON/E    L-20                    0
 44   TAPE C  PIX 6    TANK/S         L-60                    0
 45   TAPE C  PIX 7    2 1/2 TON      L-30                    0
 46   TAPE C  PIX      2 1/2 TON      L-20                    0
 47   TAPE C  PIX 8    TANK/S         L-50                    0
 48   TAPE C  PIX 9    TANK           L-50                    0
 49   TAPE C  PIX 9    APC            L-25                    0
 50   TAPE C  PIX 10   TANK           L-40                    0
 51   TAPE C  PIX 10   APC            L-25                    0

[Unreadable length entries are reproduced as they appear in the scan. The
SHK column has one value missing and reads, in page order:
6 4 5 s 3 5 4 4 6 o 4 2 5 3 2 4 4 2 4 2.]


NO.   COMMENTS OR DESCRIPTION
 52   TAPE D  PIX 1    2 1/2 TON      L-25
 53   TAPE D  PIX 1    TANK/S         L-45
 54   TAPE D  PIX 2    2 1/2 TON      L-20
 55   TAPE D  PIX 2    TANK           L-20
 56   TAPE D  PIX 2    APC            L-20
 57   TAPE D  PIX 3    2 1/2 TON      L-15
 58   TAPE D  PIX 3    TANK           L-25
 59   TAPE D  PIX 3    APC            L-20
 60   TAPE D  PIX 4    2 1/2 TON      L-20
 61   TAPE D  PIX 4    TANK           L-25
 62   TAPE D  PIX 4    APC            L-20
 63   TAPE D  PIX 5    2 1/2 TON      L-20
 64   TAPE D  PIX 5    TANK           L-25
 65   TAPE D  PIX 5    APC            L-15
 66   TAPE D  PIX 6    NO TARGET      L-20   TWO FA
 67   TAPE D  PIX 7    APC/E          L-40
 68   TAPE D  PIX 8    APC/E          L-50
 69   TAPE D  PIX 8    TANK/S         L-100
 70   TAPE D  PIX 9    NO TARGET      L-40
 71   TAPE D  PIX 10   FALSE ALARM    L-40
 72   TAPE D  PIX 10   FALSE ALARM    L-20
 73   TAPE D  PIX 10   TANK           L-20

[Only 12 POL/SHK pairs are legible for these 22 rows. In page order they
read: 0 2, 0 4, 0 1, 0 1, 0 2, 0 3, 0 3, 0 8, 0 4, 0 4, 0 2, 0 2.]


NO.   COMMENTS OR DESCRIPTION
 74   TAPE E  PIX 1    2 1/2 TON/E
 75   TAPE E  PIX 2    APC/E
 76   TAPE E  PIX 2    TANK/3/4
 77   TAPE E  PIX 3    TANK/3/4
 78   TAPE E  PIX s    FALSE ALARM
 79   TAPE E  PIX 4    APC/E
 80   TAPE E  PIX 5    APC/E
 81   TAPE E  PIX 5    TANK/3/4
 82   TAPE E  PIX 6    APC/E
 83   TAPE E  PIX 6    TANK/3/4
 84   TAPE E  PIX 7    2 1/2 TON/E
 85   TAPE E  PIX 8    APC/E
 86   TAPE E  PIX 8    TANK 3/4
 87   TAPE E  PIX 9    FALSE ALARM
 88   TAPE E  PIX 9    FALSE ALARM
 89   TAPE E  PIX 10   FALSE ALARM
 90   TAPE E  PIX 10   FALSE ALARM
 91   TAPE E  PIX 10   TANK
 92   TAPE E  PIX 10   APC
 93   TAPE E  PIX 10   FALSE ALARM
 94   TAPE E  PIX 10   FALSE ALARM

[The window numbers on this page are not printed in the scan; they follow
from the adjacent pages. The length (L-) entries are cut off at the page
edge.]


NO.   COMMENTS OR DESCRIPTION
 95   TAPE F  PIX 1    APC            L-20
 96   TAPE F  PIX 1    TANK/S         L-30
 97   TAPE F  PIX 1    2 1/2          L-15
 98   TAPE F  PIX 2    APC            L-10
 99   TAPE F  PIX 2    TANK/S         L-30
100   TAPE F  PIX 2    2 1/2          L-10
101   TAPE F  PIX 3    APC            L-20
102   TAPE F  PIX 3    TANK/S         L-25
103   TAPE F  PIX 3    2 1/2          L-10
104   TAPE F  PIX 4    APC            L-20
105   TAPE F  PIX 4    TANK/S         L-25
106   TAPE F  PIX 4    2 1/2          L-10
107   TAPE F  PIX 5    APC            L-15
108   TAPE F  PIX 5    TANK/S         L-20
109   TAPE F  PIX 5    2 1/2          L-10
110   TAPE F  PIX 5    FALSE ALARM    L-10
111   TAPE F  PIX 6    2 1/2          L-15
112   TAPE F  PIX 6    TANK           L-5
113   TAPE F  PIX 6    APC            L-20
114   TAPE F  PIX 7    2 1/2          L-15
115   TAPE F  PIX 7    TANK           L-5
116   TAPE F  PIX 7    APC            L-20
117   TAPE F  PIX 8    2 1/2          L-15
118   TAPE F  PIX 8    TANK           L-5
119   TAPE F  PIX 8    APC            L-20
120   TAPE F  PIX 9    2 1/2          L-15
121   TAPE F  PIX 9    TANK           L-5
122   TAPE F  PIX 9    APC            L-20
123   TAPE F  PIX 9    FALSE ALARM    L-40
124   TAPE F  PIX 9    FALSE ALARM    f70
125   TAPE F  PIX 10   APC/E   NEG    L«*0

[The length entries for windows 124 and 125 are reproduced as they appear
in the scan.]

NO.   COMMENTS OR DESCRIPTION
126   TAPE G-H  PIX 1    APC/E        L-40
127   TAPE G-H  PIX 1    TANK 3/4     L-75
128   TAPE G-H  PIX 2    TANK/S       L-17
129   TAPE G-H  PIX 2    F.A.         L-5
130   TAPE G-H  PIX 3    TANK/S       L-17
131   TAPE G-H  PIX 3    F.A.         L-5
132   TAPE G-H  PIX 4    TANK/S       L-20
133   TAPE G-H  PIX 4    F.A.         L-8
134   TAPE G-H  PIX 5    ?            L-10
135   TAPE G-H  PIX 5    ?            L-5
136   TAPE G-H  PIX 5    TANK         L-20
137   TAPE G-H  PIX 6    ?            L-10
138   TAPE G-H  PIX 6    ?            L-5
139   TAPE G-H  PIX 6    TANK         L-20
140   TAPE G-H  PIX 7    ?            L-15
141   TAPE G-H  PIX 7    ?            L-5
142   TAPE G-H  PIX 7    TANK         L-20
143   TAPE G-H  PIX 8    ?            L-10
144   TAPE G-H  PIX 8    ?            L-5
145   TAPE G-H  PIX 8    TANK         L-20
146   TAPE G-H  PIX 9    ?            L-10
147   TAPE G-H  PIX 9    ?            L-5
148   TAPE G-H  PIX 9    TANK         L-20
149   TAPE G-H  PIX 10   F.A.         L-40
150   TAPE G-H  PIX 10   F.A.         L-xU
151   TAPE G-H  PIX 11   TRUCK 3/4    L-100
152   TAPE G-H  PIX 12   TRUCK 3/4    L-100
153   TAPE G-H  PIX 13   APC/E        L-40
154   TAPE G-H  PIX 13   TANK 3/4     L-70
155   TAPE G-H  PIX 14   APC/E        L-45
156   TAPE G-H  PIX 14   TANK 3/4     L-oO
157   TAPE G-H  PIX 15   APC/E        L-45
158   TAPE G-H  PIX 15   TANK 3/4     L-80

[The length entries for windows 150 and 156 are reproduced as they appear
in the scan.]

PICT.
NO.   TYPE  MULX  MULY  SEQUENCE  COMMENTS OR DESCRIPTION                 POL  SHK
270           10     8      1     TAPE A  PIX 7    FALSE ALARM WINDOWS
350            9     8      1     TAPE A  PIX 10   FALSE ALARM WINDOWS
422                         1     TAPE B  PIX 2    FALSE ALARM WINDOWS
484           10     6      1     TAPE B  PIX 9    F.A.  EXCLUDE TGT        1    2
544           10     8      1     TAPE D  PIX *    F.A.                     0    2
646           10     4      1     TAPE E  PIX 10   F.A.                     0    2
686           20    16      1     TAPE D  PIX 9    F.A.                     0    1

[Each row shows only three values for the TYPE, MULX, MULY, and SEQUENCE
columns; they are placed here so that MULX and MULY agree with the
20 across by 16 down stated in the text for the last entry. Blank entries
and the PIX number shown as * are illegible in the scan.]

Figure 3-3. Photo Playbacks of 50 x 50 Images

Windows containing targets of unknown ground truth were excluded. Also,
windows from Tape N-0 often did not contain a target, indicating incorrect
correspondence with the film strips (as happened with Tape I-J). These
samples were naturally excluded.

During the various examinations of the window playbacks, it was observed
that there were many seriously degraded target images. A quick check showed
that the problem was the ripple, herringbone, and Moiré distortion seen in
the film strips and discussed earlier. It was especially apparent in the 50
x 50 window playbacks that were not shrunk because they are, in effect, a
blown-up version of the original. Two examples of the degraded windows are
shown in Figure 3-4. The upper 50 x 50 window contains a tank lifted from
image L-2 (located by the arrow in the playback of L-2). The lower 50 x 50
window is an APC lifted from image L-3. Samples that had similar serious
degradation were keyed for later reference.

Finally,  Table  3-III  shows  the  number  of  samples  used  in  the  training 
and  test  sets.  The  degraded  samples  were  also  used,  but  are  tabulated 
separately  in  the  table.* 


3.3  The  Image  Processing  Sequence 


Before proceeding into training, a brief review of the processing sequence
is in order. Figure 3-5 shows the flow of data through the processor.
Sub-images of size 50 x 50 pixels are lifted from the original scenes of size
800 x 1024. These small windows are written on another digital tape and
photographically reconstructed.


*The totals shown in Table 3-III differ slightly from those proposed in the
December '75 Progress Report. The difference arises because later examination
of the newly available playbacks indicated some missing targets, as already
mentioned.

Figure 3-4. Two Examples of Degraded Samples

The processing itself is split into two parts - a preprocessor "front-end"
and a final processor. The preprocessor has an optional two-dimensional
filter at the input. A single threshold is the only adjustable variable in
the preprocessor. The level for this minimum gradient threshold was determined
and set permanently prior to training. Final processing consists of two stages
of false alarm screening, followed by a classification stage.

For  estimating  the  performance  of  the  system,  scores  were  taken  at  the 
points  indicated  by  the  arrows.  This  will  be  further  discussed  in  conjunction 
with  the  results  tables. 

3.4  Training  Program 

Since  the  digital  image  processing  is  split  into  two  parts,  it  was 
convenient  to  perform  the  simulation  and  analysis  of  the  training  samples  in 
two  corresponding  steps. 

It was first necessary to select the amount of prefiltering and the
level of minimum gradient in the preprocessor. A small set of windows from
the training set was preprocessed using three different degrees of filtering
and three levels of minimum gradient. Plots of the preprocessor outputs
(subsets and blobs) were made on a Calcomp Model 763 plotter for visual
analysis. It was evident that a 3 x 3 pixel 2-dimensional filter reduced the
edge gradients on objects excessively. A 2 x 2 size filter was not excessive,
yet it did provide some additional noise reduction.*
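The trade-off between filter size and edge gradient can be illustrated on an ideal step edge. The sketch assumes a plain averaging (box) filter and adjacent-pixel differences; neither the report's filter weights nor its gradient operator are given in this section, so both are stand-ins.

```python
import numpy as np


def box_filter(image, k):
    """Average each k x k neighborhood (output shrinks by k-1 in each axis)."""
    img = np.asarray(image, dtype=np.float64)
    rows, cols = img.shape
    out = np.zeros((rows - k + 1, cols - k + 1))
    for dr in range(k):
        for dc in range(k):
            out += img[dr:dr + rows - k + 1, dc:dc + cols - k + 1]
    return out / (k * k)


def max_horizontal_gradient(image):
    """Largest absolute difference between horizontally adjacent pixels."""
    return float(np.abs(np.diff(image, axis=1)).max())


# A vertical step edge: gray level 0 on the left, 30 on the right.
edge = np.zeros((6, 8))
edge[:, 4:] = 30.0

g_raw = max_horizontal_gradient(edge)
g_2x2 = max_horizontal_gradient(box_filter(edge, 2))
g_3x3 = max_horizontal_gradient(box_filter(edge, 3))
```

On this idealized edge the 2 x 2 filter halves the peak difference and the 3 x 3 filter cuts it to a third, which is consistent in direction with the observation that the larger filter weakened object edges more.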

The minimum gradient threshold determines the sensitivity of the
preprocessor. As it is lowered, the number of subsets increases; i.e., fainter
edges are allowed to come through. Thus, as the threshold is reduced, fainter
targets will be detected. However, since the amount of clutter and noise data
is also increasing, the ultimate false alarm rate will be higher.

*As described earlier, those windows that were shrunk were consequently
being additionally filtered, as well as reduced in resolution.

To  provide  the  greatest  number  of  training  patterns  possible  for  feature 
analysis,  including  those  of  faint  targets,  a low  minimum  gradient  threshold 
was  desirable.  The  very  lowest  that  had  been  run  was  2.0*.  However,  that 
setting  provided  too  much  background  detail,  which  would  interfere  with  the 
computation  of  the  recognition  features  for  training.  Therefore,  a value  of 
2.5  was  chosen.  Subsequent  training  and  test  runs  were  made  at  that  threshold 
level. 
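The effect of the minimum gradient setting can be sketched by counting how many edge points survive at each level. The gradient values below are invented for illustration, the 3.0 setting is a hypothetical extra comparison point, and treating the threshold as inclusive is an assumption.

```python
import numpy as np


def edge_points(gradient_magnitude, min_gradient):
    """Count pixels whose gradient magnitude reaches the minimum threshold."""
    return int(np.count_nonzero(np.asarray(gradient_magnitude) >= min_gradient))


# Hypothetical gradient magnitudes on the 32-level gray scale.
gradients = np.array([0.5, 1.0, 2.0, 2.2, 2.5, 3.0, 3.9, 4.0, 6.5, 9.0])

counts = {t: edge_points(gradients, t) for t in (2.0, 2.5, 3.0)}
```

Lower settings admit fainter edges, both from faint targets and from background detail, which is the trade-off weighed in choosing 2.5.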


The entire training set of windows was then preprocessed and the
results saved on magnetic tape. From a previous program (Ref. 3) a set of
recognition features had been developed for a 4 class environment (tank, jeep,
truck, and personnel). Since this software already existed, an initial trial
with these features was attempted. Specifically, the training set was
processed through final classification using the existing program. Since only
one half the target types were the same and one recognition feature was
unavailable (range), the actual classification results were ignored. However,
the values of the recognition features that were calculated and printed out
were tabulated for each target sample. Scatterplots of these features were
then made.


Experience  from  previous  programs  showed  that  the  usual  statistical 
measures  such  as  means  and  variances  can  frequently  be  misleading  because 
the  distributions  are  often  multimodal.  Different  target  viewing  angles, 
resolutions,  etc.  yield  different  modes.  Analysis  using  scatterplots  proved 
to  be  the  most  effective  method  to  quickly  determine  separability  of  the  classes. 
*That  is,  2.0  out  of  32  possible  gray  levels. 
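The point about multimodal distributions can be made concrete with a small made-up example: two classes with the same mean that are nonetheless cleanly separated. The feature values and the interval bounds are hypothetical and are not taken from the report's data.

```python
from statistics import mean, pvariance

# Hypothetical feature values (e.g. a length-to-height ratio) for two classes.
class_a = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9]   # two modes, e.g. end and side views
class_b = [2.0, 2.1, 1.9, 2.0, 2.1, 1.9]   # one mode, between the modes of A

mean_a, mean_b = mean(class_a), mean(class_b)
var_a, var_b = pvariance(class_a), pvariance(class_b)


def is_class_b(x, low=1.5, high=2.5):
    """Decision read off a plot: class B occupies the middle interval."""
    return low <= x <= high


errors = (sum(is_class_b(x) for x in class_a)
          + sum(not is_class_b(x) for x in class_b))
```

The means coincide and the variance of the two-mode class is large, so summary statistics suggest heavy overlap, while a plot of the points shows a boundary that makes no errors on these samples.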
The scatterplots suggested that the existing features Aspect, AR, NFP,
and Closure should be retained. The features Khoriz and Number of Quadrants
Active should be modified. The remainder should be dropped and new features
added.


Calcomp plots had been made of the preprocessor outputs for all the
windows. Investigations of the Calcomp plots suggested some new trial features.
These were programmed into the final processor simulation, and the training
windows were rerun. Tables and scatterplots were then made of the new and
modified features. An analysis of the results indicated that some further
modifications of the new features were needed.

After re-programming the modifications, the training set was again rerun
through the final processor simulation. Scatterplots of the new features
were made. The plots suggested that the classes were not linearly separable.
Therefore, any training algorithm that converges only when the classes are
separable should not be used.

As in previous studies, the number of samples for training is much too
small to try parametric methods of designing a classifier, even if a
distribution could be assumed. Nonparametric classifiers that require storage
and searches of templates or sample patterns (e.g., k-nearest neighbor
algorithms) are either too time consuming for real-time data rates or too
limited in the number of models to handle all the variations in aspect angle,
etc.

An  adaptive  training  algorithm  such  as  the  sum-line  algorithm  that  had 
been  programmed  in-house  would  be  appropriate  for  a classifier.  However, 
time  did  not  permit  experimentation  with  it  under  a multi-class  condition. 

To expedite estimates of system performance, a two-layer classifier (Ref. 4)
was chosen. Subdecisions are made on the basis of deterministically designed
boundaries from the scatterplots and the special features KHOLE and LONTOP.
The subdecisions are then used in making a final decision.

Consequently, training was done by drawing decision boundaries from the
scatterplots that would minimize the final error-rate and still provide
reasonable extrapolative performance. The scatterplots of the training set
data (target windows) are shown in Figures 3-6(a), (b), and (c). The decision
boundaries have also been drawn in. Note that the region NON-TRUCK actually
means APC and TANK. Linear boundaries are used because they are simple to
implement in software (e.g., in a µ-Processor) and are computationally fast.

The  final  boundaries  for  the  various  features  are  shown  in  Figure  2-11 
of  Section  2.2.  The  final  decision  of  the  target  class  is  made  by  taking  a 
majority  vote  of  the  outcome  of  the  individual  boundary  sets.  In  case  of  a 
tie,  additional  rules  were  developed  from  investigations  of  the  scatterplots. 
The  final  decision  logic  is  also  described  in  Section  2.  A tabulation  of  the 
features,  grouped  by  target  class,  suggested  several  additional  criteria  to 
separate  the  target  features  from  non-target  (or  false  alarm)  features.  These 
tests  are  shown  in  Table  2-1. 
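The two-layer structure described above (linear-boundary subdecisions followed by a majority vote) can be sketched as follows. Every boundary, weight, and feature value here is hypothetical; the actual boundaries are those of Figure 2-11, and the single tie_break argument merely stands in for the additional tie rules, which are not reproduced in this section.

```python
from collections import Counter


def linear_side(weights, bias, features):
    """Sign test against a linear boundary w . x + b = 0."""
    return sum(w * features[name] for name, w in weights.items()) + bias >= 0.0


# Hypothetical boundary sets: each maps a feature dict to one class label.
def vote_aspect_ar(f):
    return "TRUCK" if linear_side({"aspect": 1.0, "ar": -2.0}, 0.5, f) else "TANK"


def vote_closure_nfp(f):
    return "APC" if linear_side({"closure": 1.0, "nfp": -0.1}, -0.2, f) else "TANK"


def vote_khole(f):
    return "APC" if f["khole"] else "TRUCK"


SUBDECISIONS = (vote_aspect_ar, vote_closure_nfp, vote_khole)


def classify(features, tie_break="TANK"):
    """Majority vote over the subdecisions; fall back to a tie rule."""
    votes = Counter(rule(features) for rule in SUBDECISIONS)
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break  # stand-in for the report's additional tie rules
    return ranked[0][0]


sample = {"aspect": 1.0, "ar": 1.5, "closure": 0.9, "nfp": 3.0, "khole": True}
label = classify(sample)
tied = classify({**sample, "khole": False})
```

Each subdecision costs only a few multiplications and a comparison, which is the practical appeal of linear boundaries noted in the text.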

3.5  Scoring  of  Training  Set 

Upon completion of the classification algorithm design, the training
samples were processed and scored. Table 3-IV gives the results. The scores
for the previously keyed degraded samples are separated from those of the
remaining samples. An denotes the degraded sample scores. The leftmost
column shows the number of target samples that were processed in this training
set. Referring back to Figure 3-5, the number of targets at the output of
the preprocessor ("Score #1" arrow) is tabulated in the second column of
Table 3-IV. It is at this point that the target has been initially detected.

The final processor has two stages of false alarm screening. The
number of targets remaining after the first stage (see Figure 3-5, "Score
#2" arrow) is listed in the third column of Table 3-IV. Those targets
remaining after the "false alarm closure test" ("Score #3" arrow) are counted
and given in the "Number Remaining Thru Closure" column. This completes the
false alarm screening. All remaining objects are now assigned to a target
class by the classification logic. Those targets that are correctly
classified are counted ("Score #4") and listed in column 5 of the table. A
special count was also taken at the "Score #4" location. The computer
simulation actually provided classification of all detected targets (i.e.,
bypassing false alarm screening). Scoring the classification of all detected
targets will give a better estimate of how well the classification algorithm
itself is performing, regardless of the screening performance. The righthand
column of Table 3-IV gives this count.

A look  at  the  data  in  Table  3-IV  shows  that  the  detection  count  is  high, 
decreasing  somewhat  through  the  false  alarm  screening  stages.  It  is  apparent 
that  the  degraded  samples  do  not  perform  nearly  as  well  as  the  acceptable 
samples.  While  the  closure  test  and  classification  stage  take  a heavy  toll 
on  the  degraded  samples,  the  acceptable  samples  do  very  well.  Recalling  the 
tremendous  distortions,  etc.  of  the  degraded  images,  it  is  not  at  all  surprising 
that  they  do  not  perform  well. 

The training results are shown in terms of performance percentages in
Table 3-V. The "Combined Samples" column contains the score for the degraded
and the acceptable samples combined together. Definitions for the different
percentages are given below.

TABLE 3-V TRAINING SET RESULTS - PERCENTAGES

DETECTION: This is the percentage of ALL target samples that
    were detected by the preprocessor, i.e., Score #1/
    Score #0 of Figure 3-5.

DETECTED AND SCREENED: This is the percentage of ALL target
    samples that remained through false alarm screening,
    i.e., Score #3/Score #0 of Figure 3-5. These
    remaining samples will all be next assigned one of
    the target classes. From an operational aspect and a
    human factors aspect, it is this score that is often
    termed "detection".

CLASSIFICATION OF DETECTED AND SCREENED SAMPLES: This is the
    percentage of the above target samples that were correctly
    classified, i.e., Score #4/Score #3 of Figure 3-5.

DETECTED, SCREENED, AND CLASSIFIED: This is the percentage of ALL
    target samples that were detected, screened, and correctly
    classified, i.e., Score #4/Score #0.

CLASSIFIER PERFORMANCE: As described earlier, a special count at
    location "Score #4" was taken to provide a performance
    estimate of the classification algorithms, independent
    of the false alarm screening. Specifically, all targets
    at location "Score #1" were run through the classification
    logic. We thus have the performance estimate:

    Special Score #4/Score #1.
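The five definitions reduce to ratios of the counts taken at the scoring points of Figure 3-5. The sketch below uses invented counts, not those of Table 3-IV, and the function and key names are assumptions.

```python
def percentages(score0, score1, score3, score4, special_score4):
    """Performance percentages from the counts at each scoring point."""
    def pct(num, den):
        return 100.0 * num / den if den else 0.0

    return {
        "detection": pct(score1, score0),
        "detected_and_screened": pct(score3, score0),
        "classification_of_screened": pct(score4, score3),
        "detected_screened_classified": pct(score4, score0),
        "classifier_performance": pct(special_score4, score1),
    }


# Hypothetical counts, not taken from Table 3-IV.
result = percentages(score0=50, score1=45, score3=40, score4=30,
                     special_score4=36)
```

Because Score #3 cancels, the Detected, Screened, and Classified figure equals the Detected and Screened figure multiplied by the Classification figure (here 80% of 75% gives 60%).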

Referring back to the data in Table 3-V, it is verified that the
detection rate is quite high, especially for the acceptable sample category.
The Detected and Screened score is also good for the acceptable samples.
The degraded samples are too broken up by distortion to effectively pass
through the screening stage, and therefore pull down the average score.
Classification of the Detected and Screened targets is good. Even the
degraded samples have a passable score, nearly twice as good as random chance
(33% for three equal target classes).

The Detected, Screened, and Classified rate is the product of the two
scores above it. So naturally, it is lower than either score. It is evident
that the lower performance of the degraded samples pulls down the "combined
samples" score. Otherwise, the acceptable samples perform well. Finally,
the Classifier Performance percentages show creditable performance, even on
the degraded samples.


In addition to scoring the performance on target samples, several
non-target images were included in the training set, for false alarm estimates.
These windows were specifically chosen to include many target-like objects,
more than an average scene would contain. This helped to derive more effective
false alarm screening criteria in the training process.

The non-target windows were processed along with the target samples and
scored. Initially, 14 out of 26 windows had false alarms. However, a detailed
examination of the results showed that 6 of the false alarms were from extraneous
image data, not a part of the FLIR scene. One source was the alphanumeric
characters superimposed on the video. Other extraneous sources were black
and white horizontal lines through the image. These are not part of the FLIR
video, but are from digitizing or magnetic tape errors. Figure 3-7 shows an
example of a white line through the image. The upper picture is a playback
of image J-9 on Tape I-J. The lower picture is a playback of J-9 on Tape
J-K; it does not contain the white line even though it's the same scene. The
last extraneous causes of false alarms were the cursor and horizon line. These
also would not normally be a part of the video sent to a processor.

Therefore, false alarms caused by these sources were henceforth excluded.
That leaves 8 out of 26 windows with false alarms, or 31%. As indicated
previously, an average scene would not contain as many target-like objects over
the whole field-of-view. Thus the average rate would be lower.

In addition, a higher preprocessor minimum gradient setting would reduce
false alarms. A sensitive setting of 2.5 had been used to provide a greater
number of target patterns for training purposes. For comparison, 61
non-target windows (including the previous 26) were processed at a preprocessor
gradient level of 4.0. The false alarm rate then dropped to 8/61 = 13%.
Time did not permit re-processing the target windows to estimate the natural
drop in detection or recognition rates. However, experience on a similar
previous study indicated that the false alarm rate dropped much faster than
the detection or recognition rates for an increase in gradient setting.
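The false alarm percentages quoted for the training phase follow directly
from the window counts. A minimal Python sketch of that arithmetic is given
here, using only counts stated in the text; the function name
false_alarm_rate is illustrative and is not part of the original program.

```python
def false_alarm_rate(windows_with_alarms: int, windows_processed: int) -> float:
    """Return the fraction of non-target windows that produced a false alarm."""
    if windows_processed <= 0:
        raise ValueError("windows_processed must be positive")
    return windows_with_alarms / windows_processed


# Counts taken from the training-phase discussion (gradient setting 2.5).
raw_rate = false_alarm_rate(14, 26)            # before excluding extraneous sources
screened_rate = false_alarm_rate(14 - 6, 26)   # 6 alarms traced to non-FLIR artifacts

# Counts for the comparison run at gradient setting 4.0.
higher_gradient_rate = false_alarm_rate(8, 61)

for label, rate in [
    ("setting 2.5, all alarms", raw_rate),
    ("setting 2.5, extraneous sources excluded", screened_rate),
    ("setting 4.0", higher_gradient_rate),
]:
    print(f"{label}: {rate:.0%}")
```

Note that the two settings were scored on different numbers of windows (26
versus 61), so the comparison is indicative rather than exact.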

A second control over the false alarm rate is in the final processor -
the false alarm closure test. A variation of this parameter will be described
in the next section.

3.6  Test  Set  Performance 

Following the conclusion of the training phase, the test samples were
processed using the thresholds and algorithms established during training.
The same types of scores were then counted. Table 3-VI presents the raw
data for the test set. As before, the degraded images have been separated
and marked by an asterisk. The columns are the same as used in Table 3-IV. The
results are more easily viewed as percentages, given in Table 3-VII. The
definitions of the various scores are also the same as detailed earlier.

TABLE 3-VI. TEST SET RESULTS - RAW SCORES

The Detection rate shown in the table is again excellent. The Detected
and Screened rate is good for the acceptable samples, but is inferior for the
degraded samples. Similar results were experienced with the training set.

Classification of Detected and Screened samples was lower than for the
training set, although still twice as high as random chance. This would
indicate that additional samples should be used in the training set to derive
more general classification boundaries. A look back at the raw data shows that
the truck class was the main cause of the lower score. An examination of the
scatterplots for the truck class shows that the number of truck samples in both
training and test is small. Therefore the full spread of their probable
feature distributions was not well represented. A larger training set should
provide better results.

The "product" score - Detected, Screened, and Classified - was driven
lower than the training scores by the lower classification performance. The
Classification Performance was also lower than for the training set. The
difference arises from the same problem as encountered by the Classification
of Detected and Screened Samples score and discussed above. Additional
training samples, especially for the truck class, should improve the
performance.


A summary of the training and test results, by window number, is shown
in Table 3-VIII. The windows that were considered degraded in quality are
indicated in the "Degraded?" column. In the "Result" column, a "C" indicates
that the target was detected, screened, and correctly classified. If the target
was detected and screened, but misclassified as to target type, a "D" is
given. Missed targets are keyed by an "M".

TABLE 3-VIII. TRAINING AND TEST RESULTS - BY WINDOW (Continued)

Window     Training
Number     or Test     Degraded?     Result

  60          X            X            M
  61          X                         C
  62          X            X            M
  63          X                         M
  64          X                         C
  65          X            X            M
  67          X                         M
  68          X                         M
  69          X            X            C
  73          X            X            M
  74          X                         D
  75          X                         C
  76          X                         C
  77          X                         C
  79          X                         D
  80          X            X            M
  81          X                         C
  82          X                         C
  83          X                         M
  84          X                         C
  85          X                         D
  86          X                         C
  91          X            X            D
  92          X            X            M
  95          X                         D
  96          X                         D
  97          X            X            M
  98          X            X            M
  99          X            X            C
 100          X                         M
 101          X            X            C
 102          X                         M
 103          X            X            M
 104          X                         M
 105          X                         M
 106          X            X            M
 107          X            X            M
 108          X            X            D
 109          X            X            M
 111          X            X            M
 112          X            X            C
 113          X            X            C
 114          X            X            M
 115          X            X            M
 116          X            X            C
 117          X            X            M
 118          X            X            M
 119          X            X            C
 120          X            X            D
 121          X            X            C
 122          X                         M
 126          X                         C
 127          X                         C
 128          X            X
 130          X            X            M
 132          X            X            C
 136          X                         M
 139          X            X            M
 145          X            X            M
 148          X            X            M
 151          X                         C
 152          X                         C
 153          X                         C
 154          X                         D
 155          X                         C
 156          X                         C
 157          X                         C
 158          X                         C
 159          X                         C
 160          X                         C
 161

As alluded to earlier, for false alarm testing, non-target windows were
lifted from the original digital images the same way as target windows. For
the test phase, 213 were processed. After excluding the false alarms caused
by extraneous sources, 42 windows or 20% had false alarms. Note that this is
lower than the training false alarm rate. For training, windows containing
target-like objects were selected, but the test windows were selected to
represent areas over the whole field-of-view.

The  false  alarm  rate  can  be  reduced  in  at  least  six  ways,  as  follows: 

1. Reduce sensitivity threshold

2. Modify classification thresholds

3. Increase prefiltering

4.  Tighten  detection  criteria 

5.  Use  context  information 

6.  Use  range  information. 

As shown in the training results, the rate is lowered considerably by
increasing the minimum gradient setting of the preprocessor (reducing the
sensitivity). An increase of 1.5 lowered the false alarm rate by 18
percentage points (from 31% to 13%) in that test (Par. 3.5). Further insight
into the effects of changing the setting can be obtained from the results of
a similar test program with FLIR imagery, done for the Army at Frankford
Arsenal (Ref. 3). Several hundred samples were run at three different
gradient or sensitivity levels. The results are tabulated below.

                              DETECTED        DETECTED, SCREENED,   FALSE ALARM
SENSITIVITY     DETECTION     AND SCREENED    CLASSIFIED            RATE

Low (4.0)          87%            67%               51%                 2%
Medium (3.0)       95%            75%               54%                 7%
High (2.0)         98%            85%               58%                19%

The present test results, run at a gradient threshold of 2.5, are close to
those at the 2.0 setting above. It can be seen that reduction of sensitivity may
significantly reduce the false alarm rate while only slightly reducing the final
classification rate.
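To make the trade-off concrete, the Frankford figures tabulated above can be
compared between the high and low sensitivity settings. The short Python
sketch below does this; the dictionary layout and the helper relative_drop
are illustrative choices, and only the percentages come from the table.

```python
# Rates from the Frankford Arsenal table, in percent.
frankford = {
    "low (4.0)": {"detection": 87, "detected_screened": 67,
                  "classified": 51, "false_alarm": 2},
    "medium (3.0)": {"detection": 95, "detected_screened": 75,
                     "classified": 54, "false_alarm": 7},
    "high (2.0)": {"detection": 98, "detected_screened": 85,
                   "classified": 58, "false_alarm": 19},
}


def relative_drop(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


high, low = frankford["high (2.0)"], frankford["low (4.0)"]

fa_drop = relative_drop(high["false_alarm"], low["false_alarm"])
cls_drop = relative_drop(high["classified"], low["classified"])

print(f"false alarm rate falls by {fa_drop:.0%} of its high-sensitivity value")
print(f"final classification rate falls by {cls_drop:.0%}")
```

Going from the high to the low setting removes most of the false alarms
(19% to 2%) while the Detected, Screened, and Classified score gives up only
7 percentage points (58% to 51%).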

A second option for reducing the rate is in the final processor. If the
false alarm closure criterion is increased, the false alarm rate will decrease.
As an example, if the minimum acceptable closure is changed from 0.37 to 0.40,
the 20% false alarm rate becomes 32 alarms in 213 windows, or 15%. Additional
study would be needed to determine how the detection rate is affected by this
criterion.
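The effect of raising the minimum acceptable closure can be pictured as a
simple threshold test on each candidate object. The Python sketch below is
only an illustration: the closure values are hypothetical, the function name
count_alarms is invented here, and it is assumed that a candidate survives
the test when its closure is at or above the minimum. Only the 0.37 and 0.40
thresholds and the 32-in-213 count come from the text.

```python
def count_alarms(closures, min_closure):
    """Count candidates whose closure meets or exceeds the minimum acceptable value."""
    return sum(1 for c in closures if c >= min_closure)


# Hypothetical closure values for candidates found in non-target windows.
candidate_closures = [0.35, 0.38, 0.39, 0.41, 0.55, 0.36, 0.44]

alarms_loose = count_alarms(candidate_closures, 0.37)  # original criterion
alarms_tight = count_alarms(candidate_closures, 0.40)  # stricter criterion

# Reported outcome of the stricter criterion on the 213 test windows.
reported_rate = 32 / 213

print(alarms_loose, alarms_tight, f"{reported_rate:.0%}")
```

Candidates with marginal closure are the ones removed by the stricter
setting; real targets with marginal closure would be removed as well, which
is why the effect on detection rate needs separate study.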

A third  method  of  reducing  the  false  alarms  is  to  increase  the  amount  of 
prefiltering  of  the  data.  Either  defocusing-type  filtering  or  more  elaborate 
neighborhood  averaging  type  filtering  would  reduce  those  false  alarms  caused 
by  noise.  If  the  significant  target  features  are  not  obliterated  by  the 
filtering,  the  recognition  rate  may  be  maintained. 
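As an illustration of neighborhood averaging, the Python sketch below
replaces each pixel by the mean of its 3 x 3 neighborhood. It is a generic
box filter written for this discussion, not the prefilter used in the
program; the function name, the border handling (the neighborhood is clipped
at the image edge), and the sample image are all assumptions.

```python
def neighborhood_average(image, radius=1):
    """Replace each pixel by the mean of its (2*radius+1)-square neighborhood.

    Neighborhoods are clipped at the image borders.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            values = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = sum(values) / len(values)
    return out


# Flat background of gray level 10 with a single noise spike of 100.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 100

smoothed = neighborhood_average(noisy)
print(smoothed[2][2])  # spike is reduced from 100 to 20.0
```

The isolated spike is cut to one fifth of its original value, so it is less
likely to exceed a gradient threshold. The same averaging also softens real
target edges, which is the risk noted in the paragraph above.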

False alarms could also be reduced by using only blobs as "cues" to
locate candidate objects. Long subsets would not be used as cues. For this
set of images, the false alarm count would drop from 42 down to 10 false
alarms out of 213 windows, or 5%. Some targets would also be lost, but a
modification of the blob detector stage could retrieve a portion of them.
Further experiments into this possibility are desirable.

False alarms might also be reduced through the use of texture or context
information. Texture statistics are already computed in the preprocessor.
Using them to classify the terrain was initially accomplished in the Phase I
portion of the Frankford study. Knowledge of the terrain type and use of other
background statistics can help prevent false alarms. In the present program,
texture statistics were generated. However, training and test efforts to
classify the terrain, and incorporation of the data into the decision logic
were not within the scope of the program.

Finally, range information can be quite useful in preventing false alarms.
Future sensor systems are likely to have available ranging devices for weapon
delivery. It was found in the Frankford study that the use of range information
aids in rejecting false alarms, as well as increasing target classification
accuracy.

Given the variety of possibilities for reducing the false alarm rate,
reducing the initial 20% rate to, say, 1% is not unrealistic. Since each
window represents about 1/80 of the field-of-view area (100 x 100 out of
800 x 1024 pixels), there would then be 0.80 false alarms per frame, or one
alarm per 1.25 frames.

To the operator, though, the effective rate would be lower. Except
for the occasional effects of noise, new false alarms would ordinarily be
generated only as new scenes are covered by the field-of-view. But the
scene changes only slowly (over several seconds) when the sensor looks out
at targets at long range. Therefore, it should be kept in mind that the
false alarm rate per frame will apply to changes of scene in the field-of-
view, in the system application, and not to the refresh rate of the sensor.
The operator will be faced with one probable false alarm over perhaps 5 to
10 seconds, based on the false alarm rate noted above.
3.7 Discussion

At this point, it is timely to note two important points about the
limited number of target samples. First, as emphasized by the problem with
the truck class, on a per class basis the number of samples may not be
sufficient to fully represent their distributions in the feature space. In
general, for unimodal distributions, a minimum training set should contain
at least 10 samples per feature per class to provide a suitable estimate of
the distribution. (In practice, though, that is often difficult to achieve.)
In this case, there are 10 different features employed in the classification
boundaries. So while approximately 100 truck samples would be desirable,
only 19 were available for training. This naturally creates difficulties in
estimating suitable boundaries for adequate performance on completely new
samples (e.g., the test set).
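The rule of thumb above reduces to a one-line calculation. The Python sketch
below applies it to the truck class, using the figures given in the text (10
features, 10 samples per feature per class, 19 truck samples); the function
name required_samples is illustrative.

```python
def required_samples(n_features: int, samples_per_feature: int = 10) -> int:
    """Minimum training samples per class under the rule of thumb."""
    return n_features * samples_per_feature


N_FEATURES = 10          # features used in the classification boundaries
TRUCK_SAMPLES = 19       # truck samples available for training

needed = required_samples(N_FEATURES)
coverage = TRUCK_SAMPLES / needed

print(f"needed per class: {needed}")
print(f"truck class has {coverage:.0%} of the desired training samples")
```

With under one fifth of the desired samples, the truck class boundaries rest
on a thin estimate of the feature distributions.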

The second point concerning the number of samples is the confidence
level. The statistical nature of the test creates some uncertainty about the
performance estimates. The confidence interval expresses how much confidence
is justified by the sample set. For example, the test score for Detected,
Screened, and Classified samples was 48%. If we assume that the outcome of
this score was binomially distributed* (a yes or no scoring), then the 95%
confidence interval for 50 samples is 33% to 63%. Additional samples would
narrow this wide range.
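The interval quoted above can be approximated with the usual normal
approximation to the binomial distribution. The Python sketch below is one
way to do so; it is not necessarily the method used in the study, which does
not state how the interval was obtained. The function name and the 200-sample
comparison are illustrative.

```python
import math


def binomial_confidence_interval(p_hat: float, n: int, z: float = 1.96):
    """Normal-approximation interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% confidence level.
    """
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Test score for Detected, Screened, and Classified samples: 48% of 50.
low_50, high_50 = binomial_confidence_interval(0.48, 50)

# Same score with a hypothetical larger sample, to show the narrowing.
low_200, high_200 = binomial_confidence_interval(0.48, 200)

print(f"n = 50:  {low_50:.0%} to {high_50:.0%}")
print(f"n = 200: {low_200:.0%} to {high_200:.0%}")
```

For 50 samples this gives roughly 34% to 62%, in close agreement with the 33%
to 63% quoted in the text; the small difference may reflect the use of exact
binomial limits or rounding. Quadrupling the sample count halves the width of
the interval.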

For a more "averaged" look at the performance estimates, the training
and test results are combined in Table 3-IX. The degraded images have been
excluded since they are probably unrealistic in terms of pure FLIR video.
Detection and classification are both good. The "product" score, Detected,
Screened, and Classified, is also acceptable.

*A questionable assumption in view of the complexity of the features and
classification algorithm. So this is likely to be an optimistic estimate.


To put the scores into perspective, consider the performance of a human
interpreter. The Detected and Screened score, as previously noted, is
equivalent to what a human observer would call "detection", and the
Classification of Detected and Screened score is equivalent to an observer's
"recognition" rate. Mr. John Dehne of NVL indicates that for this type of
imagery, an observer's detection rate is approximately 90% and the recognition
rate is about 50%, under ideal conditions.

Additionally, an ad hoc experiment was performed in-house on this
particular set of imagery. The experiment was carried out with a volunteer
subject* who had not studied or viewed separately the training and test sets.
Three different sets of 50 x 50 windows (gray level playbacks) were viewed
and classified by the subject. At the end of each of the three sets, a
score was made, allowing some feedback to the subject. The totals of the
three tests are shown in Table 3-X. Even here, the truck class was inferior.
The average score was only 60%, indicating difficulty with this imagery base.
This score is the "equivalent" of the Classification of Detected and Screened
Samples because the subject knew that every sample viewed did contain a target.
The false alarm or detection rate was not investigated. It is not intended
to imply that the machine classifier is better than the human - the conditions
were not identical and the number of samples was too limited. However, it is
gratifying to note that they are not highly different.

4.0 CONCLUSIONS AND RECOMMENDATIONS

The objective of this study was to estimate the performance of the digital
image processing techniques that have been developed on imagery from an
875-line TV compatible FLIR sensor. Conclusions derived from this study are
discussed below. In addition, recommendations are made with regard to the
emphasis of future efforts.

From the simulation test described in Section 3.0, the following
conclusions are drawn:

1. Initial acquisition of target material is in the 90% range.
However, rejection of some targets is necessary to limit the rate
of false alarms. The best compromise depends upon mission
requirements.

2. Classification performance was generally in the 60-80% range.
Specific performance depended upon the size of the training set
and the quality of the images.

3. Extraneous sources of degradation of the imagery made testing
and evaluation more difficult. In practice, it is assumed that
most of these sources would not be present in the FLIR video.

4. In view of the large number of features needed to separate the
target classes, the size of the data base was very limited. This
made extrapolation from the training samples to the test samples
a precarious trial for the classifier.